Skip to content

Conversation

@ballista01
Copy link
Member

@ballista01 ballista01 commented Jul 16, 2025

Summary

  • Refactors EtcdClusterReconciler.Reconcile into focused phases to reduce cyclomatic complexity and make behavior easier to reason about.
  • Adds a reconcileState container so data gathered in earlier phases can be reused without repeated lookups.
  • Reworks the controller unit tests to target each phase individually with fake clients instead of the previous envtest-based flow.

Controller changes (internal/controller/etcdcluster_controller.go)

  • Split the main reconcile loop into:

    • fetchAndValidateState
    • bootstrapStatefulSet
    • performHealthChecks
    • reconcileClusterState

    Each returns early with a ctrl.Result when it needs a requeue or encounters an error.

  • Introduced reconcileState (cluster, StatefulSet, member list, member health) to pass context between phases.

  • fetchAndValidateState:

    • Sets a default image registry.
    • Prepares TLS credentials.
    • Verifies StatefulSet ownership.
    • Signals requeues when dependent resources aren’t yet ready.
  • bootstrapStatefulSet:

    • Unifies StatefulSet creation/initial scaling and the headless service check.
    • Uses ctrl.Result.IsZero() to decide whether to continue.
  • performHealthChecks and reconcileClusterState:

    • Now operate on memberHealth.
    • Added comments clarifying learner promotion, scale-out/-in, and health gating.

Tests (internal/controller/etcdcluster_controller_test.go)

  • Replaced the previous integration-style test with table-driven unit tests using controller-runtime’s fake client.

  • Added TestFetchAndValidateState and TestBootstrapStatefulSet, switching to require for hard preconditions and asserting on StatefulSet/Service/ConfigMap outcomes.

  • Expanded coverage for bootstrap scenarios:

    • Initial creation.
    • 0→1 promotion.
    • No-op when resources exist.
  • Verified cluster state remains untouched when nothing changes.

@k8s-ci-robot
Copy link

Hi @ballista01. Thanks for your PR.

I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jberkus
Copy link
Contributor

jberkus commented Jul 22, 2025

/ok-to-test

- split the monolithic Reconcile into fetch/sync/health/cluster phases backed by a shared reconcileState

- ensure TLS setup, STS ownership checks, and primitive bootstrapping are driven through dedicated helpers

- refresh controller unit tests to exercise the new helpers and drop obsolete scaffolding

Signed-off-by: Wenxue Zhao <[email protected]>
@ivanvc
Copy link
Member

ivanvc commented Sep 30, 2025

I was reading the code. But the changes look a little bit lengthy. Do you think that it would be possible to break down this pull request?

@ballista01 ballista01 changed the title feat(controller, e2e): Refactor reconcile loop and add E2E tests for scaling and promotion feat(controller): Refactor reconcile loop and add unit tests Sep 30, 2025
Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The refactoring makes sense to me. Just raised several minor comments.

Thanks.

cc @ivanvc @ArkaSaha30 @abdurrehman107 @hakman @frederiko PTAL when you get free cycle, thx

Copy link
Member

@ivanvc ivanvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pull request, @ballista01. I left a couple of comments.

I also noticed that you removed some comments from lengthy functions. I think it makes sense to comment these functions, as it is easy to get lost with all the conditions they have. Thanks again :)

@ahrtr
Copy link
Member

ahrtr commented Oct 3, 2025

The change for internal/controller/etcdcluster_controller.go looks clean and better. Thanks. (But do not get much time to look into the test case.)

Let's target to merge this PR before our next WG meeting.

@ahrtr
Copy link
Member

ahrtr commented Oct 8, 2025

@ivanvc any further comments on this PR?

@ivanvc
Copy link
Member

ivanvc commented Oct 10, 2025

@ivanvc any further comments on this PR?

No, I just reviewed again, and if you're ok with this PR, we can merge it.

Copy link
Member

@ivanvc ivanvc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks, @ballista01.

Copy link
Member

@ahrtr ahrtr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for the great work!

@k8s-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, ballista01, ivanvc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ahrtr ahrtr merged commit 49c0b4a into etcd-io:main Oct 10, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants